Data Loading

Using PowerQuery, I imported the data from a CSV file. After analyzing the column descriptions, I confirmed that none of them contained errors or missing values. However, I quickly noticed that the dataset includes missing days — some dates are absent from the data. Interestingly, in both datasets (alr_d, ing_d), the missing days are exactly the same.

Alior Bank company data
Alior Bank company data
ING Bank company data
ING Bank company data

Missing Data Analysis

I checked what percentage of the entire dataset is made up of missing days. If it were less than 5%, they could have been omitted. However, in our case, the missing data accounts for as much as 32%, so ignoring them could distort the entire distribution analysis.

Generated set of all dates in the analyzed period
Generated set of all dates in the analyzed period
Final dataset in Excel
Final dataset in Excel

Data Preparation

The table with the required data was saved to a CSV file. Then, in VSCode, I replaced commas , with dots . (06_change_comma_to_dot), because Excel in my region saves numbers with a comma as the decimal separator.

Separator replacement
Separator replacement

Linear Interpolation of Missing Values

The next step is to perform linear interpolation of the missing data.

Definition of Linear Interpolation

Linear interpolation is a method to estimate missing values by assuming a linear change between two known points.

Mathematically, if a value \(y\) at position \(x\) is missing between two known points \((x_0, y_0)\) and \((x_1, y_1)\), it is computed as:

\[ y = y_0 + \frac{y_1 - y_0}{x_1 - x_0} \cdot (x - x_0) \]

Example

Consider the dataset:

\[ x = (3, 7, \text{NA}, 5, 8, \text{NA}, \text{NA}, 11) \]

  1. The first missing value (NA) is between 7 (position 2) and 5 (position 4). Using the formula:

\[ y = 7 + \frac{5 - 7}{4 - 2} \cdot (3 - 2) = 7 + (-1) = 6 \]

  1. The second missing value (NA) is between 8 (position 5) and 11 (position 8). Since there are two consecutive NAs, we interpolate them linearly:
  • Position 6:

\[ y = 8 + \frac{11 - 8}{8 - 5} \cdot (6 - 5) = 8 + 1 = 9 \]

  • Position 7:

\[ y = 8 + \frac{11 - 8}{8 - 5} \cdot (7 - 5) = 8 + 2 = 10 \]

After linear interpolation, the completed dataset becomes:

\[ x = (3, 7, 6, 5, 8, 9, 10, 11) \]

Loading Data

I load the data into R and split it into the appropriate columns:

data <- read.csv(
  file = "data.csv",
  header = TRUE,
  sep = ";"
)
data
data_dates <- data$date
data_alior <- data$close_alr
data_ing <- data$close_ing

Using Build-In Functions

I use the approx function in R for linear interpolation.

The approx function works as follows:

  • It takes a vector of known points:

    • x: representing indexes of non-empty values,
    • y: representing values at indexes at x.
  • It computes values at target points (xout) through linear interpolation, i.e., it determines a point on the line connecting two neighboring known points.

  • The parameter rule = 2 ensures that values outside the range of known points are extrapolated instead of returning NA.

Below is my custom function that fills in the missing values:

linear_interpolation <- function(data) {
  na_indexes <- c()
  no_na_indexes <- c()
  no_na_values <- c()
  for (i in 1:length(data)) {
    if (is.na(data[i])) {
      na_indexes <- c(na_indexes, i)
    } else {
      no_na_indexes <- c(no_na_indexes, i)
      no_na_values <- c(no_na_values, data[i])
    }
  }
  
  interpolated_data <- approx(
    x = no_na_indexes,
    y = no_na_values,
    xout = na_indexes,
    rule = 2
  )
  
  response <- data
  response[na_indexes] <- interpolated_data$y
  
  return(response)
}

I perform the interpolation separately for each column:

data_alior_int <- linear_interpolation(data_alior)
data_ing_int <- linear_interpolation(data_ing)

Saving the interpolated data

I save the interpolated data into a CSV file so that it can be used for further analysis:

new_data <- data.frame(
  date = data_dates,
  closes_alior = data_alior_int,
  closes_ing = data_ing_int
)

write.csv(
  x = new_data,
  file = "int_data.csv",
  row.names = FALSE
)
new_data

Summary

Thanks to this step, the dataset is now complete, and the missing values have been filled using linear interpolation.
This ensures that the data is ready for further analysis of the closing price distributions.

LS0tCnRpdGxlOiAiQ2hhcHRlciAyOiBEYXRhIFByZXBhcmF0aW9uIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgojIyBEYXRhIExvYWRpbmcKClVzaW5nICoqUG93ZXJRdWVyeSoqLCBJIGltcG9ydGVkIHRoZSBkYXRhIGZyb20gYSBDU1YgZmlsZS4gQWZ0ZXIgYW5hbHl6aW5nIHRoZSBjb2x1bW4gZGVzY3JpcHRpb25zLCBJIGNvbmZpcm1lZCB0aGF0IG5vbmUgb2YgdGhlbSBjb250YWluZWQgZXJyb3JzIG9yIG1pc3NpbmcgdmFsdWVzLiBIb3dldmVyLCBJIHF1aWNrbHkgbm90aWNlZCB0aGF0IHRoZSBkYXRhc2V0IGluY2x1ZGVzICoqbWlzc2luZyBkYXlzKiog4oCUIHNvbWUgZGF0ZXMgYXJlIGFic2VudCBmcm9tIHRoZSBkYXRhLiBJbnRlcmVzdGluZ2x5LCBpbiBib3RoIGRhdGFzZXRzIChgYWxyX2RgLCBgaW5nX2RgKSwgdGhlIG1pc3NpbmcgZGF5cyBhcmUgZXhhY3RseSB0aGUgc2FtZS4KCiFbKkFsaW9yIEJhbmsgY29tcGFueSBkYXRhKl0oLi4vc2NyZWVuc2hvdHMvMDFfYWxyLnBuZykgIAoKCiFbKklORyBCYW5rIGNvbXBhbnkgZGF0YSpdKC4uL3NjcmVlbnNob3RzLzAyX2luZy5wbmcpCgojIyBNaXNzaW5nIERhdGEgQW5hbHlzaXMKCkkgY2hlY2tlZCB3aGF0IHBlcmNlbnRhZ2Ugb2YgdGhlIGVudGlyZSBkYXRhc2V0IGlzIG1hZGUgdXAgb2YgbWlzc2luZyBkYXlzLiBJZiBpdCB3ZXJlIGxlc3MgdGhhbiA1JSwgdGhleSBjb3VsZCBoYXZlIGJlZW4gb21pdHRlZC4gSG93ZXZlciwgaW4gb3VyIGNhc2UsIHRoZSBtaXNzaW5nIGRhdGEgYWNjb3VudHMgZm9yIGFzIG11Y2ggYXMgKiozMiUqKiwgc28gaWdub3JpbmcgdGhlbSBjb3VsZCBkaXN0b3J0IHRoZSBlbnRpcmUgZGlzdHJpYnV0aW9uIGFuYWx5c2lzLgoKIVsqR2VuZXJhdGVkIHNldCBvZiBhbGwgZGF0ZXMgaW4gdGhlIGFuYWx5emVkIHBlcmlvZCpdKC4uL3NjcmVlbnNob3RzLzA0X2ltcG9ydGVkX2RhdGVzLnBuZykgIAoKCiFbKkZpbmFsIGRhdGFzZXQgaW4gRXhjZWwqXSguLi9zY3JlZW5zaG90cy8wNV9yZWFkeV9kYXRhX2V4Y2VsLnBuZykKCiMjIERhdGEgUHJlcGFyYXRpb24KClRoZSB0YWJsZSB3aXRoIHRoZSByZXF1aXJlZCBkYXRhIHdhcyBzYXZlZCB0byBhIENTViBmaWxlLiBUaGVuLCBpbiAqKlZTQ29kZSoqLCBJIHJlcGxhY2VkIGNvbW1hcyBgLGAgd2l0aCBkb3RzIGAuYCAoYDA2X2NoYW5nZV9jb21tYV90b19kb3RgKSwgYmVjYXVzZSBFeGNlbCBpbiBteSByZWdpb24gc2F2ZXMgbnVtYmVycyB3aXRoIGEgY29tbWEgYXMgdGhlIGRlY2ltYWwgc2VwYXJhdG9yLiAgCgohWypTZXBhcmF0b3IgcmVwbGFjZW1lbnQqXSguLi9zY3JlZW5zaG90cy8wNl9jaGFuZ2VfY29tbWFfdG9fZG90LnBuZykKCgojIyBMaW5lYXIgSW50ZXJwb2xhdGlvbiBvZiBNaXNzaW5nIFZhbHVlcwoKVGhlIG5leHQgc3RlcCBpcyB0byBwZXJmb3JtICoqbGluZWFyIGludGVycG9sYXRpb24qKiBvZiB0aGUgbWlzc2luZyBkYXRhLiAgCgojIyMgRGVmaW5pdGlvbiBvZiBMaW5lYXIgSW50ZXJwb2xhdGlvbgoKTGluZWFyIGludGVycG9sYXRpb24gaXMgYSBtZXRob2QgdG8gZXN0aW1hdGUgbWlzc2luZyB2YWx1ZXMgYnkgYXNzdW1pbmcgYSBsaW5lYXIgY2hhbmdlIGJldHdlZW4gdHdvIGtub3duIHBvaW50cy4gIAoKTWF0aGVtYXRpY2FsbHksIGlmIGEgdmFsdWUgXCh5XCkgYXQgcG9zaXRpb24gXCh4XCkgaXMgbWlzc2luZyBiZXR3ZWVuIHR3byBrbm93biBwb2ludHMgXCgoeF8wLCB5XzApXCkgYW5kIFwoKHhfMSwgeV8xKVwpLCBpdCBpcyBjb21wdXRlZCBhczoKClxbCnkgPSB5XzAgKyBcZnJhY3t5XzEgLSB5XzB9e3hfMSAtIHhfMH0gXGNkb3QgKHggLSB4XzApClxdCgojIyMjIEV4YW1wbGUKCkNvbnNpZGVyIHRoZSBkYXRhc2V0OgoKXFsKeCA9ICgzLCA3LCBcdGV4dHtOQX0sIDUsIDgsIFx0ZXh0e05BfSwgXHRleHR7TkF9LCAxMSkKXF0KCjEuIFRoZSBmaXJzdCBtaXNzaW5nIHZhbHVlIChgTkFgKSBpcyBiZXR3ZWVuIDcgKHBvc2l0aW9uIDIpIGFuZCA1IChwb3NpdGlvbiA0KS4gVXNpbmcgdGhlIGZvcm11bGE6CgpcWwp5ID0gNyArIFxmcmFjezUgLSA3fXs0IC0gMn0gXGNkb3QgKDMgLSAyKSA9IDcgKyAoLTEpID0gNgpcXQoKMi4gVGhlIHNlY29uZCBtaXNzaW5nIHZhbHVlIChgTkFgKSBpcyBiZXR3ZWVuIDggKHBvc2l0aW9uIDUpIGFuZCAxMSAocG9zaXRpb24gOCkuIFNpbmNlIHRoZXJlIGFyZSAqKnR3byBjb25zZWN1dGl2ZSBOQXMqKiwgd2UgaW50ZXJwb2xhdGUgdGhlbSBsaW5lYXJseToKCi0gUG9zaXRpb24gNjogCgpcWwp5ID0gOCArIFxmcmFjezExIC0gOH17OCAtIDV9IFxjZG90ICg2IC0gNSkgPSA4ICsgMSA9IDkKXF0KCi0gUG9zaXRpb24gNzogCgpcWwp5ID0gOCArIFxmcmFjezExIC0gOH17OCAtIDV9IFxjZG90ICg3IC0gNSkgPSA4ICsgMiA9IDEwClxdCgpBZnRlciBsaW5lYXIgaW50ZXJwb2xhdGlvbiwgdGhlIGNvbXBsZXRlZCBkYXRhc2V0IGJlY29tZXM6CgpcWwp4ID0gKDMsIDcsIDYsIDUsIDgsIDksIDEwLCAxMSkKXF0KCiMjIyBMb2FkaW5nIERhdGEKCkkgbG9hZCB0aGUgZGF0YSBpbnRvIFIgYW5kIHNwbGl0IGl0IGludG8gdGhlIGFwcHJvcHJpYXRlIGNvbHVtbnM6CgpgYGB7cn0KZGF0YSA8LSByZWFkLmNzdigKICBmaWxlID0gImRhdGEuY3N2IiwKICBoZWFkZXIgPSBUUlVFLAogIHNlcCA9ICI7IgopCmBgYAoKYGBge3J9CmRhdGEKYGBgCgpgYGB7cn0KZGF0YV9kYXRlcyA8LSBkYXRhJGRhdGUKZGF0YV9hbGlvciA8LSBkYXRhJGNsb3NlX2FscgpkYXRhX2luZyA8LSBkYXRhJGNsb3NlX2luZwpgYGAKCiMjIyBVc2luZyBCdWlsZC1JbiBGdW5jdGlvbnMKCkkgdXNlIHRoZSBgYXBwcm94YCBmdW5jdGlvbiBpbiBSIGZvciBsaW5lYXIgaW50ZXJwb2xhdGlvbi4KClRoZSBgYXBwcm94YCBmdW5jdGlvbiB3b3JrcyBhcyBmb2xsb3dzOgoKLSBJdCB0YWtlcyBhIHZlY3RvciBvZiBrbm93biBwb2ludHM6CiAgLSBgeGA6IHJlcHJlc2VudGluZyBpbmRleGVzIG9mIG5vbi1lbXB0eSB2YWx1ZXMsCiAgLSBgeWA6IHJlcHJlc2VudGluZyB2YWx1ZXMgYXQgaW5kZXhlcyBhdCBgeGAuCgotIEl0IGNvbXB1dGVzIHZhbHVlcyBhdCB0YXJnZXQgcG9pbnRzIChgeG91dGApIHRocm91Z2ggKipsaW5lYXIgaW50ZXJwb2xhdGlvbioqLCBpLmUuLCBpdCBkZXRlcm1pbmVzIGEgcG9pbnQgb24gdGhlIGxpbmUgY29ubmVjdGluZyB0d28gbmVpZ2hib3Jpbmcga25vd24gcG9pbnRzLiAgCgotIFRoZSBwYXJhbWV0ZXIgYHJ1bGUgPSAyYCBlbnN1cmVzIHRoYXQgdmFsdWVzIG91dHNpZGUgdGhlIHJhbmdlIG9mIGtub3duIHBvaW50cyBhcmUgKipleHRyYXBvbGF0ZWQqKiBpbnN0ZWFkIG9mIHJldHVybmluZyBOQS4gIAoKQmVsb3cgaXMgbXkgY3VzdG9tIGZ1bmN0aW9uIHRoYXQgZmlsbHMgaW4gdGhlIG1pc3NpbmcgdmFsdWVzOgoKCmBgYHtyfQpsaW5lYXJfaW50ZXJwb2xhdGlvbiA8LSBmdW5jdGlvbihkYXRhKSB7CiAgbmFfaW5kZXhlcyA8LSBjKCkKICBub19uYV9pbmRleGVzIDwtIGMoKQogIG5vX25hX3ZhbHVlcyA8LSBjKCkKICBmb3IgKGkgaW4gMTpsZW5ndGgoZGF0YSkpIHsKICAgIGlmIChpcy5uYShkYXRhW2ldKSkgewogICAgICBuYV9pbmRleGVzIDwtIGMobmFfaW5kZXhlcywgaSkKICAgIH0gZWxzZSB7CiAgICAgIG5vX25hX2luZGV4ZXMgPC0gYyhub19uYV9pbmRleGVzLCBpKQogICAgICBub19uYV92YWx1ZXMgPC0gYyhub19uYV92YWx1ZXMsIGRhdGFbaV0pCiAgICB9CiAgfQogIAogIGludGVycG9sYXRlZF9kYXRhIDwtIGFwcHJveCgKICAgIHggPSBub19uYV9pbmRleGVzLAogICAgeSA9IG5vX25hX3ZhbHVlcywKICAgIHhvdXQgPSBuYV9pbmRleGVzLAogICAgcnVsZSA9IDIKICApCiAgCiAgcmVzcG9uc2UgPC0gZGF0YQogIHJlc3BvbnNlW25hX2luZGV4ZXNdIDwtIGludGVycG9sYXRlZF9kYXRhJHkKICAKICByZXR1cm4ocmVzcG9uc2UpCn0KYGBgCgpJIHBlcmZvcm0gdGhlIGludGVycG9sYXRpb24gc2VwYXJhdGVseSBmb3IgZWFjaCBjb2x1bW46CgpgYGB7cn0KZGF0YV9hbGlvcl9pbnQgPC0gbGluZWFyX2ludGVycG9sYXRpb24oZGF0YV9hbGlvcikKZGF0YV9pbmdfaW50IDwtIGxpbmVhcl9pbnRlcnBvbGF0aW9uKGRhdGFfaW5nKQpgYGAKCiMjIyBTYXZpbmcgdGhlIGludGVycG9sYXRlZCBkYXRhCgpJIHNhdmUgdGhlIGludGVycG9sYXRlZCBkYXRhIGludG8gYSBDU1YgZmlsZSBzbyB0aGF0IGl0IGNhbiBiZSB1c2VkIGZvciBmdXJ0aGVyIGFuYWx5c2lzOgoKYGBge3J9Cm5ld19kYXRhIDwtIGRhdGEuZnJhbWUoCiAgZGF0ZSA9IGRhdGFfZGF0ZXMsCiAgY2xvc2VzX2FsaW9yID0gZGF0YV9hbGlvcl9pbnQsCiAgY2xvc2VzX2luZyA9IGRhdGFfaW5nX2ludAopCgp3cml0ZS5jc3YoCiAgeCA9IG5ld19kYXRhLAogIGZpbGUgPSAiaW50X2RhdGEuY3N2IiwKICByb3cubmFtZXMgPSBGQUxTRQopCmBgYAoKYGBge3J9Cm5ld19kYXRhCmBgYAoKIyMgU3VtbWFyeQoKVGhhbmtzIHRvIHRoaXMgc3RlcCwgdGhlIGRhdGFzZXQgaXMgbm93IGNvbXBsZXRlLCBhbmQgdGhlIG1pc3NpbmcgdmFsdWVzIGhhdmUgYmVlbiBmaWxsZWQgdXNpbmcgbGluZWFyIGludGVycG9sYXRpb24uICAKVGhpcyBlbnN1cmVzIHRoYXQgdGhlIGRhdGEgaXMgcmVhZHkgZm9yIGZ1cnRoZXIgYW5hbHlzaXMgb2YgdGhlIGNsb3NpbmcgcHJpY2UgZGlzdHJpYnV0aW9ucy4K